
Computational Psychiatry

Ubiquity Press, Ltd.

Preprints posted in the last 90 days, ranked by how well they match Computational Psychiatry's content profile, based on 12 papers previously published here. The average preprint has a 0.01% match score for this journal, so anything above that is already an above-average fit.

1
Beyond model-free Pavlovian responding: a two-stage Pavlovian-instrumental transfer paradigm

Wirth, L. A.; Sadedin, N.; Meder, B.; Schad, D. J.

2026-03-09 neuroscience 10.64898/2026.03.06.710018 medRxiv
Top 0.1%
23.0%

Background: Pavlovian responding is a core component of behavior and can be measured via Pavlovian-instrumental transfer (PIT), where Pavlovian responses bias instrumental actions. Standard single-lever PIT paradigms, which assess responses using a single choice option, cannot dissociate the contributions of model-free versus model-based reinforcement learning. While indirect evidence suggests a role for model-free responding in single-lever PIT, the contribution of model-based strategies is unclear. It also remains unknown whether internal cognitive states, such as mind wandering, impair specifically model-based but not model-free PIT, as is theoretically expected. Methods: We developed a novel, trial-by-trial two-stage PIT paradigm designed to computationally dissociate model-free and model-based Pavlovian responding by leveraging probabilistic state transitions and trial-wise outcome predictions. After each two-stage Pavlovian learning trial, participants performed a single-lever PIT trial as well as a query trial of explicit value judgment. Detailed task instructions were provided to support potential model-based strategies. Computational modeling was used to quantify individual learning strategies. We assessed mind wandering with questionnaires and thought probes. Results: Analysis of query and PIT trials revealed trial-by-trial updating of outcome expectations based on the probabilistic task structure, consistent with model-based Pavlovian responding. Behavioral responses during PIT were best explained by a model-based reinforcement learning model. In contrast, we found little evidence for model-free Pavlovian responding. Higher levels of mind wandering were associated with reduced model-based control but did not impact model-free indices. Conclusion: We introduce a novel single-lever PIT paradigm that enables fine-grained dissociation of model-free versus model-based Pavlovian response systems.
Our findings provide evidence that single-lever PIT can operate through model-based mechanisms, challenging the assumption that single-lever PIT is predominantly model-free. Our findings also indicate that internal attentional states selectively modulate model-based PIT. Given the involvement of Pavlovian responding in numerous psychiatric conditions, our paradigm offers new avenues for understanding maladaptive behavior. Author Summary: Our daily actions are often influenced by cues like the smell of food or the sound of phone notifications that signal potential rewards or losses. These Pavlovian cues can shape our instrumental behavior even though their outcomes do not depend on what we do - a process known as Pavlovian-instrumental transfer (PIT). Here we study the computational learning mechanisms that underlie such PIT effects. While it is often assumed that Pavlovian responding follows simple, automatic rules without a cognitive model of cue consequences (i.e., model-free), evidence also shows a role for cognitive anticipations in Pavlovian responding (i.e., model-based). In this study, we extend this evidence by showing that PIT responding can be driven by flexible model-based learning. We designed a task to test whether participants use model-free versus model-based strategies to guide PIT, providing detailed task instructions. Using reinforcement learning models, we found that most participants used model-based learning when forming cue-outcome associations. Importantly, people's attention mattered: when they were more distracted and mind wandering, they relied less on model-based strategies. Our findings suggest that Pavlovian learning is complex, flexible, and influenced by internal mental states, opening new windows to understand decision-making problems in mental health conditions like addiction.
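The model-free versus model-based contrast at the heart of this paradigm can be illustrated with a minimal sketch (not the authors' model; the learning rate, transition probabilities, and values below are arbitrary): a model-free learner caches a cue value via a delta rule, while a model-based learner recomputes it from an explicit transition model, so only the latter reacts instantly when second-stage values are revalued.

```python
def model_free_update(v, reward, alpha=0.3):
    # Rescorla-Wagner delta rule: nudge the cached cue value toward the outcome.
    return v + alpha * (reward - v)

def model_based_value(p_transition, state_values):
    # Expected value computed from an explicit model of state transitions.
    return sum(p * v for p, v in zip(p_transition, state_values))

# Model-free: the cached value only tracks directly experienced rewards.
v_mf = 0.0
for r in [1, 1, 0, 1]:
    v_mf = model_free_update(v_mf, r)

# Model-based: the cue's value changes immediately when second-stage values
# change, with no new direct experience of the cue itself.
v_mb = model_based_value([0.7, 0.3], [1.0, 0.0])
v_mb_revalued = model_based_value([0.7, 0.3], [0.0, 1.0])
```

Dissociating the two experimentally comes down to creating trials where these two value estimates diverge.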

2
Dynamic and Baseline Multi-Task Learning for Predicting Substance Use Initiation in the ABCD Study

Wei, M.; Zhang, H.; Peng, Q.

2026-04-13 addiction medicine 10.64898/2026.04.10.26350655 medRxiv
Top 0.1%
10.5%

Background: Early initiation of substance use is linked to later adverse outcomes, and risk factors come from multiple domains and are shared across substances. In our previous work, traditional time-to-event Cox models identified individual risk factors, but these models are not designed to jointly model multiple outcomes or capture complex non-linear relationships. Multi-task learning (MTL) can leverage shared structure across related outcomes to improve prediction and distinguish common versus substance-specific predictors. However, most MTL studies rely on baseline features and focus on single outcomes, which limits their ability to capture shared risk and temporal changes. Substance use initiation is a time-dependent process that unfolds during development and reflects changing exposures over time. Baseline-only models cannot capture these changes or represent risk dynamics. Discrete-time modeling provides a practical approach by estimating interval-level initiation risk and combining it into cumulative risk at the subject level. By integrating multi-task learning with dynamic modeling, it is possible to share information across outcomes while capturing how risk evolves over time, which may improve prediction performance. Methods: Using the Adolescent Brain Cognitive Development (ABCD) Study (release 5.1), we developed two complementary MTL frameworks to predict initiation of alcohol, nicotine, cannabis, and any substance use. A baseline MTL model predicted fixed-horizon (48-month) initiation using one record per participant, while a dynamic discrete-time MTL model incorporated longitudinal interval data to model time-varying risk. Both models used multi-domain environmental exposures, core covariates, and polygenic risk scores (PRS). Performance was evaluated on a held-out test set using AUROC, PR-AUC, and calibration metrics, and compared with single-task logistic regression (LR).
Feature importance was assessed using permutation importance and compared with Cox proportional hazards models. Results: MTL showed comparable or improved performance relative to LR, with larger gains for low-prevalence outcomes (cannabis and nicotine). Incorporating longitudinal information led to consistent improvements across all outcomes. Dynamic models increased AUROC by +0.044 to +0.062 for MTL and +0.050 to +0.084 for LR, indicating that temporal information was the primary driver of performance gains. Feature importance analyses showed modest overlap across methods, with higher agreement between dynamic MTL and Cox models than static MTL. A small set of features, including externalizing behavior, parental monitoring, and developmental factors, was consistently identified across all approaches. Conclusions: Dynamic multi-task learning improves the prediction of substance use initiation by leveraging longitudinal structure and shared information across outcomes. While MTL provides additional gains, incorporating time-varying information is the dominant factor for improving performance. Combining baseline and dynamic frameworks offers a comprehensive strategy for identifying robust risk factors and modeling adolescent substance use initiation.
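The discrete-time construction described above, interval-level initiation risks combined into a subject-level cumulative risk, can be sketched in a few lines (the hazard values are illustrative, not ABCD estimates):

```python
def cumulative_risk(hazards):
    # Discrete-time survival: the chance of never initiating is the product
    # of (1 - h_t) over intervals; cumulative risk is its complement.
    survival = 1.0
    for h in hazards:
        survival *= 1.0 - h
    return 1.0 - survival

# Rising per-interval initiation risk across four follow-up intervals.
risk = cumulative_risk([0.02, 0.05, 0.08, 0.12])
```

A dynamic model predicts each interval hazard h_t from time-varying features, then aggregates exactly this way, which is what lets it represent risk dynamics that a fixed-horizon baseline model cannot.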

3
GAMBIT: A Digital Tool to Train Distinct Inhibitory Control Mechanisms

Dirupo, G.; Westwater, M. L.; Khaikin, S.; Feder, A.; DePierro, J. M.; Charney, D. S.; Murrough, J. W.; Morris, L. S.

2026-03-06 psychiatry and clinical psychology 10.64898/2026.03.05.26347639 medRxiv
Top 0.1%
9.7%

Deficits in inhibitory control are common across a wide range of psychiatric disorders and are closely linked to symptom severity, including emotional dysregulation, anxiety, substance misuse, and self-harm, making them an appealing target for intervention. Cognitive training offers a low-cost, scalable, and non-invasive strategy to strengthen inhibitory control; however, most existing paradigms target only a single facet of inhibition and rarely account for environmental influences, such as affective context. To address these gaps, we developed a computerized inhibitory control training paradigm to simultaneously engage three components of inhibition: preemptive, proactive, and reactive, while embedding trials within positive and negative affective contexts to assess the impact of emotional stimuli. Across two online experiments, participants completed the GAMBIT task in one session (Experiment 1, N = 300) or repeated over three sessions (Experiment 2, N = 65). The task included No-Go trials to train preemptive inhibition, stop-signal trials for reactive inhibition, and stop-signal anticipation trials to train proactive inhibition. Affective images of differing valence were presented as background stimuli to evaluate their impact on inhibitory performance. In Experiment 1, participants showed higher accuracy on No-Go versus reference Go trials (β = 1.45, SE = 0.09, p < .001), confirming successful manipulation of preemptive inhibition. Reaction times were slower during anticipation trials across two different conditions (β = 0.16, SE = 0.04, p < .001; β = 0.07, SE = 0.04, p = .047), consistent with proactive slowing when anticipating a potential stop signal. Additionally, positive affective images (β = 0.10, SE = 0.009, p < .001) further slowed RTs, indicating emotional interference with proactive control.
In Experiment 2, the pattern of higher No-Go accuracy was replicated (β = 0.91, SE = 0.11, p < .001) and accuracy generally improved over sessions (β = 0.38, SE = 0.06, p < .001). In anticipation trials, RTs became shorter across sessions (session 2: β = -0.25, SE = 0.06, p < .001; session 3: β = -0.45, SE = 0.06, p < .001), reflecting practice-related gains, and SSRTs decreased over time (F(2,56) = 6.26, p = .004), consistent with enhanced reactive inhibition. Proactive inhibition was modulated by affective images, with both negative (β = 0.04, SE = 0.02, p = .039) and positive (β = 0.16, SE = 0.02, p < .001) affective images associated with slower RTs. Participants also reported reductions in self-assessed temper control by the last session (W = 25.5, p = .007, q = .037, d = -0.51), and usability ratings were high (all means ≥ 3.87/5). Together, these findings show that this paradigm recruits multiple forms of inhibitory control and yields training-related improvements in both performance and affective outcomes. This provides preliminary validation of a scalable, fully online inhibitory control training tool targeting multiple dissociable inhibitory processes within affective contexts. The approach holds promise as an accessible transdiagnostic intervention to support symptom improvement across psychiatric disorders, with future work needed to evaluate clinical efficacy in patient populations.

4
Inferring the causes of noise from binary outcomes: A normative theory of learning under uncertainty

Fang, X.; Piray, P.

2026-03-03 neuroscience 10.64898/2026.03.01.708925 medRxiv
Top 0.1%
4.4%

Inferring the true cause of noise--distinguishing between volatility (environmental change) and stochasticity (outcome randomness)--is essential for learning in noisy environments. While most studies rely on binary outcomes, previous models are designed for continuous outcomes and use ad hoc approximations to handle binary data, introducing theoretical inconsistencies and interpretational issues. Here, we develop a normative framework for inferring the causes of noise from binary feedback that remains faithful to the discrete nature of the generative process and underlying statistical structure. First, we establish a generative model using a state-space approach tailored for binary outcomes and derive the corresponding hidden Markov model inference procedure. Second, we introduce a computational model combining the hidden Markov model with particle filtering to simultaneously infer volatility and stochasticity from binary outcomes. Third, we validate predictions through a 2x2 probabilistic reversal learning task with human participants, systematically manipulating both noise parameters. Results show that participants adjust their learning rates consistent with model predictions, increasing learning rates under volatile conditions and decreasing them under high stochasticity. Our theoretical and experimental results offer a principled approach for dissociating volatility and stochasticity from binary outcomes, providing insights into learning processes relevant to typical cognition and psychiatric conditions.
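Why the two noise sources should push learning rates in opposite directions can be seen in a Gaussian (Kalman-filter) caricature; this is a deliberately simplified continuous-outcome sketch, not the authors' HMM-plus-particle-filter model. The effective learning rate is the Kalman gain, which rises with volatility and falls with stochasticity:

```python
def kalman_step(mean, var, outcome, volatility, stochasticity):
    # Predict: uncertainty about the latent value grows by the volatility.
    pred_var = var + volatility
    # Update: the Kalman gain acts as the effective learning rate.
    gain = pred_var / (pred_var + stochasticity)
    new_mean = mean + gain * (outcome - mean)
    new_var = (1.0 - gain) * pred_var
    return new_mean, new_var, gain

# Same starting belief and outcome; only the inferred noise cause differs.
_, _, lr_volatile = kalman_step(0.5, 0.1, 1.0, volatility=0.5, stochasticity=0.1)
_, _, lr_noisy = kalman_step(0.5, 0.1, 1.0, volatility=0.1, stochasticity=0.5)
```

The paper's contribution is to perform this trade-off correctly when outcomes are binary, where the Gaussian approximation above breaks down.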

5
Classification of Adolescent Drinking via Behavioral, Biological, and Environmental Features: A Machine Learning Approach with Bias Control

Liu, R.; Azzam, M.; Zabik, N.; Wan, S.; Blackford, J.; Wang, J.

2026-02-26 addiction medicine 10.64898/2026.02.24.26347002 medRxiv
Top 0.1%
3.8%

In 2024, approximately 30% of U.S. adolescents reported having consumed alcohol at least once in their lifetime, with about 25% of these individuals engaging in binge drinking. Adolescent alcohol use is associated with neurodevelopmental impairments, elevated risk of later alcohol use, and mental health disorders. These findings underscore the importance of identifying the variables driving adolescent alcohol use and leveraging them for early identification and targeted intervention. Previous studies have typically developed machine-learning classification models that use neuroimaging data in combination with limited clinical measurements. Neuroimaging data are expensive and difficult to obtain at scale, whereas clinical measures are more practical for large-scale screening due to their low cost and widespread accessibility. However, clinical-only approaches for alcohol drinking classification remain largely underexplored. Furthermore, prior studies have often focused on adults, limiting generalizability to the broader adolescent population. Additionally, confounding factors such as age and substance use, which are strongly correlated with alcohol consumption, have often been inadequately addressed, potentially inflating classification performance. Finally, class imbalance remains a persistent challenge, with prior attempts yielding only limited improvements. To address these limitations, we propose FocalTab, a framework that integrates TabPFN with focal loss for robust generalization and effective mitigation of class imbalance. The approach also incorporates an initial preprocessing step that removes the confounding effects of age and substance use. We compare FocalTab against state-of-the-art methods across different variable selections and dataset settings.
FocalTab achieves the highest accuracy (84.3%) and specificity (80.0%) in the most stringent setting, in which both age and substance use variables were excluded, whereas competing models drop to near-chance specificity (12-24%). We further applied SHapley Additive exPlanations (SHAP) analysis to identify key clinical predictors of drinker classification, supporting enhanced screening and early intervention.
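FocalTab's imbalance handling builds on the standard binary focal loss (the α and γ values below are the common illustrative defaults, not necessarily the paper's settings), which down-weights confident, easy examples so the rare positive class dominates the gradient:

```python
import math

def focal_loss(p, y, gamma=2.0, alpha=0.25):
    # p: predicted probability of the positive class; y: true label (0 or 1).
    p_t = p if y == 1 else 1.0 - p
    alpha_t = alpha if y == 1 else 1.0 - alpha
    # The (1 - p_t)^gamma factor shrinks the loss on well-classified examples.
    return -alpha_t * (1.0 - p_t) ** gamma * math.log(p_t)

easy = focal_loss(0.95, 1)  # confident correct prediction: near-zero loss
hard = focal_loss(0.30, 1)  # misclassified minority example: much larger loss
```

Compared with plain cross-entropy, the modulating factor keeps abundant, easily classified non-drinkers from swamping the signal from the minority drinker class.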

6
Decomposing response inhibition: a POMDP model

Wang, W.; Kaufmann, T.; Dayan, P.

2026-03-02 neuroscience 10.64898/2026.02.26.708416 medRxiv
Top 0.1%
3.6%

Inhibition is a core cognitive control function whose competence is distributed across the population, with more extreme impairments in psychiatric conditions such as attention deficit hyperactivity disorder (ADHD). The Stop Signal Task (SST) is a widely used paradigm for assessing this ability. However, conventional formalizations of SST performance, such as the independent race model, rely on assumptions that are frequently violated in modern experimental designs. Furthermore, the typical focus is on fitting mean reaction times, overlooking trial-by-trial dynamics. To address these limitations, we model the SST as a partially observable Markov decision process. This framework characterizes inhibitory control through distinct components: noisy perceptual inference regarding stimuli, and optimal control balanced against potential costs. To assess the ability of the model to capture the distribution of inhibitory capacities, we fit it to data from the large Adolescent Brain Cognitive Development (ABCD) study baseline cohort (N = 5,114). To do this, we adapted Simulation-Based Inference with a transformer-based encoder. This architecture learns compact, sequence-aware embeddings from raw behavioral data. These embeddings enable amortized inference of individual-level parameter posteriors in an efficient and reliable end-to-end manner, as confirmed by extensive validation. We identified distinct computational phenotypes associated with ADHD traits. Children with higher ADHD scores exhibited greater directional imprecision, a diminished intrinsic penalty for inhibition failures, and a more deterministic response style. Notably, the learned embedding space reveals a continuous manifold where children with higher ADHD scores are heterogeneously distributed, rather than forming distinct disorder clusters. This indicates that similar clinical traits can emerge from diverse combinations of computational mechanisms, supporting a dimensional perspective on neurodiversity.
Our framework can be extended to a broader range of cognitive tasks, offering a scalable solution for fitting complex models to large-scale behavioral data. Author summary: Inhibitory control is essential for adjusting thoughts and behavior and is often impaired in conditions like ADHD. Traditional models of the Stop Signal Task (SST) often oversimplify the complex decision-making involved. We formalized these cognitive processes using a more biologically grounded framework (POMDP). This approach separates perceptual processing from control adjustments and remains valid in diverse experimental designs where traditional models fail. To apply the model at scale, we developed a specialized machine learning approach (TeSBI). This allowed us to efficiently reverse-engineer individual cognitive profiles. Applying it to the ABCD dataset (which includes more than 5,000 children), we found that higher ADHD scores are linked to specific computational deficits: noisy sensory processing, a lack of concern for errors, and a deterministic response style. Crucially, children with higher ADHD scores did not form a single disorder cluster but displayed diverse cognitive combinations, supporting a dimensional view of neurodiversity. Our results show that our model effectively captures complex inhibition mechanisms. By combining theory-driven cognitive modeling with scalable data-driven inference, this framework enables the precise analysis of large-scale behavioral datasets. This paves the way for more personalized approaches in computational psychiatry by recognizing the heterogeneity within clinical traits.
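For contrast with the POMDP decomposition, the conventional independent-race summary statistic the paper moves beyond, the stop-signal reaction time (SSRT), is commonly estimated with the integration method. A sketch with made-up reaction times in milliseconds:

```python
def ssrt_integration(go_rts, mean_ssd, p_respond_on_stop):
    # Integration method under the independent race model: SSRT is the go-RT
    # at the p(respond | stop signal) quantile of the go-RT distribution,
    # minus the mean stop-signal delay (SSD).
    rts = sorted(go_rts)
    n = max(1, round(p_respond_on_stop * len(rts)))
    return rts[n - 1] - mean_ssd

go_rts = [420, 450, 470, 480, 500, 510, 530, 550, 580, 600]  # ms, illustrative
ssrt = ssrt_integration(go_rts, mean_ssd=250, p_respond_on_stop=0.5)
```

This single number collapses perception, control, and response policy into one latency, which is exactly the conflation the POMDP's separate components are designed to undo.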

7
Estimating the Smallest Worthwhile Difference (SWD) of Psychotherapy for Alcohol Use Disorder: Protocol for a Cross-Sectional Survey

Sahker, E.; Lu, I.; Eddie, D.; So, R.; Luo, Y.; Omae, K.; Tajika, A.; Angelo, J. P.; Crisp, T.; Coffin, B.; Furukawa, T. A.

2026-02-27 addiction medicine 10.64898/2026.02.16.26346220 medRxiv
Top 0.1%
3.0%

Background: Psychotherapy is proven efficacious for the treatment of alcohol use disorder (AUD). However, the patient-perceived importance of its effect is not fully appreciated in the evidence base. The smallest worthwhile difference (SWD) represents the smallest beneficial effect of an intervention that patients deem worthwhile in exchange for the harms, expenses, and inconveniences associated with the intervention, and facilitates the interpretation of the patient-perceived worthiness of an intervention. Methods: The proposed study will estimate the SWD of NIAAA-recommended psychotherapies for AUD treatment with English-speaking American respondents aged 18 and older. Primary participants will be recruited using the Prolific research crowdsourcing site. The SWD will be estimated using the Benefit-Harm Trade-off Method, presenting survey respondents with variable, hypothetical magnitudes of psychotherapy outcomes to find the smallest acceptable effect over a natural remission alternative. The overall average SWD and subgroup distributions by participant AUD treatment experiences and AUD symptomology will be described. Secondary findings will estimate the smallest recommendable risk difference for AUD psychotherapy from providers and criminal justice professionals. Expected Results: We expect to find an estimate of the SWD for AUD psychotherapy. Further, we expect that the SWD will vary between clinical subgroups based on AUD symptomology and treatment experiences. We expect differences in SWDs between the general population and those of providers and criminal justice professionals. Findings from this project will inform the treatment decision process about psychotherapy during the clinical consultation for people with AUD.

8
Data Diversity vs. Model Complexity in the Prediction of Pediatric Bipolar Disorder: Evidence from Academic and Community Clinical Samples

Shi, Z.; Youngstrom, E. A.; Liu, Y.; Youngstrom, J. K.; Findling, R. L.

2026-03-27 psychiatry and clinical psychology 10.64898/2026.03.26.26349447 medRxiv
Top 0.1%
2.7%

Pediatric bipolar disorder is challenging to diagnose accurately due to symptom heterogeneity. More standardized and data-driven approaches are needed to enhance diagnostic reliability. We evaluated a clinical decision tool (nomogram), statistical methods (logistic regression, LASSO), machine learning models (support vector machine, random forest, k-nearest neighbors, extreme gradient boosting), and a deep learning model (multilayer perceptron) for pediatric bipolar disorder prediction across two datasets collected in academic (N=550) and community (N=511) clinical settings. We compared three modeling strategies: cross-dataset validation, cross-dataset with interaction terms, and mixed-dataset. We assessed model performance using discrimination ability, calibration, and predictor importance ranking. In the baseline cross-dataset approach, all models showed good internal discrimination in the academic dataset, but external discrimination in the community dataset declined substantially. Interaction-enhanced models slightly improved internal discrimination but not external performance or calibration. Recalibration prominently improved cross-dataset calibration without compromising discrimination, indicating that transportability problems were largely driven by probability scaling. Models trained on mixed datasets exhibited much stronger external discrimination and calibration. Across models and training strategies, family risk and PGBI-10M were consistently ranked as the most important predictors. Predictive models for pediatric bipolar disorder showed strong internal performance but limited cross-setting generalizability due to dataset shift and miscalibration. Increasing model complexity did not improve external performance, whereas training on pooled data substantially improved both discrimination and calibration.
Findings suggest that sampling diversity, rather than model complexity, is more valuable for developing clinically useful and generalizable psychiatric prediction models, underscoring the importance of open and collaborative datasets.
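The recalibration idea invoked above can be sketched as Platt-style logit recalibration; this names the general technique, not necessarily the paper's exact procedure. Only an intercept and slope are refit against the new setting's outcomes, which fixes probability scaling while leaving the model's ranking, and hence discrimination, intact whenever the slope stays positive:

```python
import math

def recalibrate(probs, labels, lr=0.5, steps=5000):
    # Fit sigmoid(a + b * logit(p)) to the new setting by gradient descent.
    logits = [math.log(p / (1.0 - p)) for p in probs]
    a, b = 0.0, 1.0
    for _ in range(steps):
        grad_a = grad_b = 0.0
        for z, y in zip(logits, labels):
            q = 1.0 / (1.0 + math.exp(-(a + b * z)))
            grad_a += q - y
            grad_b += (q - y) * z
        a -= lr * grad_a / len(logits)
        b -= lr * grad_b / len(logits)
    return a, b

# Overconfident external predictions versus a lower-prevalence outcome
# (toy numbers for illustration only).
probs = [0.9, 0.8, 0.85, 0.7, 0.6, 0.3, 0.4, 0.2]
labels = [1, 0, 1, 0, 1, 0, 0, 0]
a, b = recalibrate(probs, labels)
cal = [1.0 / (1.0 + math.exp(-(a + b * math.log(p / (1 - p))))) for p in probs]
mean_cal = sum(cal) / len(cal)
```

After fitting, the mean recalibrated probability matches the new setting's base rate, the signature of a pure probability-scaling fix.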

9
Compression Efficiency and Structural Learning as a Computational Model of DLN Cognitive Stages

Wu, A.

2026-02-03 neuroscience 10.64898/2026.02.01.703168 medRxiv
Top 0.1%
2.6%

We propose a computational instantiation of three cognitive stages from the Dot-Linear-Network (DLN) framework, grounded in a compression-efficiency thesis. DLN stages are characterized as graph-structured belief-dependency representations used to evaluate options: Dot as no persistent belief graph (reactive policies with negligible internal state), Linear as a null graph over option beliefs (K independent option estimates with no information sharing), and Network as shared latent structure (a bipartite factor graph in which F latent factors connect to K options), augmented by a temporal exposure state and an explicit structural learning cycle (hypothesis → test → update/expand). We distinguish two compression targets--option-factor structure (shared components in expected outcomes) and stakes-factor structure (shared drivers of consequence-bearing exposures)--whose intersection yields jointly efficient actions that simultaneously improve expected outcomes and marginal exposure impact. In a bandit-like simulation (100 seeds, K ∈ {20, 50, 100, 200}, F = 5), Network policies dominate Linear policies in cost-adjusted utility at large K, with the empirical crossover occurring much earlier than an analytic cost-only prediction (K* = F + c_meta/c_param), revealing that the advantage is primarily statistical (shrinkage-like estimation gains from factor pooling) rather than purely computational. Under stakes, all non-DLN agents--including Linear-Plus agents with identical factor structure and Network-standard agents with hierarchical Bayesian learning--collapse due to unmodeled cumulative exposure, while Network-DLN maintains positive utility. Within-stage consistency tests (two algorithmically distinct agents per stage) confirm that the collapse pattern is determined by representational topology, not algorithmic choice.
These results evaluate internal consistency of a DLN-to-computation mapping under explicit assumptions; they do not validate a developmental theory in humans.

10
Nonparametric Bayesian Contextual Control: Integrating Automatisation and Prior Knowledge for Stable Adaptive Behaviour

Hranova, S.; Kiebel, S.; Smolka, M. N.; Schwöbel, S.

2026-02-28 neuroscience 10.64898/2026.02.26.708143 medRxiv
Top 0.1%
2.1%

Humans have a remarkable ability to act efficiently and accurately in familiar situations while remaining flexible in novel circumstances. Nonparametric contextual inference has been proposed as a computational principle that can model how agents achieve flexible yet stable behaviour in dynamic and possibly unknown environments. However, it remains an open question how humans learn, deploy and reuse stable contextual task representations so efficiently. To address this question, we propose the nonparametric Bayesian Contextual Control (NP-BCC) model, which integrates nonparametric contextual learning with two well-established cognitive mechanisms: repetition-based automatisation and schema-like prior knowledge. These two mechanisms are assumed to support behavioural stability and facilitate novel task acquisition. Simulations in dynamic multi-armed bandit tasks of increasing difficulty illustrate how the NP-BCC can acquire and reuse contextual task representations, with the proposed mechanisms operating in the intended, functionally meaningful manner. Specifically, we show via simulations that automatisation not only enhances task performance but also stabilizes contextual inference and structure learning, while structured prior knowledge accelerates the acquisition of novel contexts. We discuss the implications of our findings for computational accounts of adaptive behaviour and contextual learning, and outline directions for future empirical work, including investigations of context-dependent behavioural dysregulation relevant to conditions such as substance use disorders. Author summary: People are very good at repeating well-learned actions in familiar situations, but they can also quickly adjust their behaviour when circumstances change. How the brain balances stability and flexibility is still not fully understood. There is growing evidence that the brain organizes experience into different "contexts", which are mental representations of encountered situations.
Computational models based on this idea can in principle reproduce flexible behaviour, but they often become unstable in complex environments. To improve stability, we borrow two simple strategies from everyday human behaviour. First, people tend to repeat actions that have worked well before. Second, when facing something new, they often reuse strategies from similar past situations. Using simulations, we show that combining these strategies with context-based learning produces more reliable behaviour in the model. Prior experience helps the model understand new situations more quickly, while repeated actions help stabilise behaviour once a situation becomes familiar. Taken together, our findings show how such mechanisms can give rise to both flexible and stable behaviour in the model.

11
Identification of Suicide-Related Subgroups Using Latent Class Analysis: Complementary Insights to Explainable AI-Based Classification

Kizilaslan, B.; Mehlum, L.

2026-03-27 psychiatry and clinical psychology 10.64898/2026.03.25.26349264 medRxiv
Top 0.1%
1.8%

Purpose: Suicide and self-harm are major public health concerns characterized by substantial clinical and psychosocial heterogeneity. While latent class analysis has been used to identify subgroups of people with suicidal behavior, the extent to which such population-level phenotyping complements explainable artificial intelligence-based classification models remains unclear. Methods: We applied latent class analysis to a cross-sectional, publicly available dataset of 1000 individuals presenting with self-harm and suicide-related behaviors at Colombo South Teaching Hospital, Kalubowila, Sri Lanka. Sociodemographic, psychosocial, and clinical variables were used to identify latent subgroups. Class characteristics and suicide prevalence were examined and compared with variable importance patterns reported in a previously published explainable artificial intelligence (XAI)-based suicide classification study using the same dataset. Results: Four latent classes were identified. Two classes exhibited very high suicide prevalence (91.2% [95% CI: 87.7-93.8] and 99.0% [95% CI: 96.4-99.7]), whereas two classes showed low prevalence (<1%). The two high-prevalence classes differed markedly in lifetime psychiatric hospitalization history, with one class showing a 100% prevalence of prior hospitalization and the other substantially lower hospitalization rates. These patterns partially aligned with, and extended beyond, variable importance findings from the XAI-based model. Conclusion: Latent class analysis identified distinct subgroups with substantially different suicide prevalence and clinical profiles, underscoring the heterogeneity of individuals presenting with self-harm. Comparison with the XAI-based suicide classification model findings suggests that unsupervised phenotyping and supervised classification provide complementary perspectives, offering population-level context that may enhance the interpretability of suicide assessment frameworks.
Keywords: suicide; self-harm; latent class analysis; explainable artificial intelligence; machine learning

12
Value-Based Evidence Accumulation as a Transdiagnostic Marker of General Distress

Pushkarskaya, H.; Russell, C. M.; Cheng, K.; Chen, J.; Pittenger, C.

2026-02-18 pathology 10.64898/2026.02.16.706202 medRxiv
Top 0.1%
1.7%

General distress cuts across psychiatric symptom domains, yet its computational correlates remain poorly defined. We examined whether drift rate--a core parameter indexing the efficiency of evidence accumulation--is more strongly associated with general distress than with domain-specific symptoms. In a cross-sectional online sample of 441 adults from the general population, participants completed a perceptual and value-based decision-making task, symptom assessments, and cognitive testing. Drift rates were estimated using hierarchical drift-diffusion modeling. Individuals with severe symptom elevations showed robust reductions in drift rate, particularly for value-based decisions. Mixed-effects models demonstrated that general distress, indexed by the Positive Symptom Distress Index, was more strongly associated with value-based than perceptual drift rate, even after accounting for all symptom domains. Value-based drift rate also explained variance in general distress beyond that accounted for by elevated symptoms across domains and selectively attenuated associations with somatization and paranoid symptoms. These findings suggest that value-based evidence accumulation captures a transdiagnostic component of distress-related impairment that is not reducible to symptom burden alone.
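The drift-rate parameter can be made concrete with a single-trial simulator; this is a plain Euler-Maruyama sketch with arbitrary parameter values, not the hierarchical drift-diffusion fitting used in the study. Lower drift means slower, less efficient evidence accumulation:

```python
import random

def simulate_ddm(drift, rng, threshold=1.0, noise=1.0, dt=0.001, max_t=5.0):
    # Evidence x drifts toward a boundary with Gaussian noise; crossing
    # +threshold is choice 1, -threshold is choice 0; t is the decision time.
    x, t = 0.0, 0.0
    sd = noise * dt ** 0.5
    while abs(x) < threshold and t < max_t:
        x += drift * dt + rng.gauss(0.0, sd)
        t += dt
    return (1 if x >= threshold else 0), t

rng = random.Random(42)
rts_high_drift = [simulate_ddm(2.0, rng)[1] for _ in range(200)]
rts_low_drift = [simulate_ddm(0.5, rng)[1] for _ in range(200)]
mean_high = sum(rts_high_drift) / 200
mean_low = sum(rts_low_drift) / 200
```

Fitting inverts this generative process: observed choices and response times constrain the drift rate, which is how reduced value-based drift shows up as slower, noisier decisions in distressed participants.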

13
Benchmarking Language Models for Clinical Safety: A Primer for Mental Health Professionals

Flathers, M.; Nguyen, P. A. H.; Herpertz, J.; Granof, M.; Ryan, S. J.; Wentworth, L.; Moutier, C. Y.; Torous, J.

2026-03-23 psychiatry and clinical psychology 10.64898/2026.03.20.26348900 medRxiv
Top 0.1%
1.7%

Background: Millions of people use language models to discuss mental health concerns, including suicidal ideation, but limited frameworks exist for evaluating whether these systems respond safely. Benchmarking, the practice of administering standardized assessments to language models, offers direct parallels to clinical competency evaluation, yet few clinicians are involved in designing, validating, or interpreting these assessments. Aims: To introduce mental health professionals to benchmarking language models by administering a validated clinical instrument and demonstrating how configuration decisions, measurement limitations, and scoring context affect result interpretation. Method: We administered the Suicide Intervention Response Inventory (SIRI-2) programmatically to nine commercially available language models from three providers. Each item was presented 60 times per model (three prompt variants × two temperature settings × 10 repetitions), yielding 27,000 model responses compared against point-in-time expert consensus. Results: Total scores ranged from 19.5 to 84.0 (expert panel baseline: 32.5). Prompt design alone shifted individual model scores by as much as the difference between trained and untrained human groups. The best-performing model approached the instrument's measurement floor. All nine models consistently overrated clinically inappropriate responses that sounded supportive. Conclusions: A single benchmark score can support markedly different claims depending on the assumed standard of clinical behavior, the instrument's remaining measurement range, and the configuration that produced the result. The skills required to make these distinctions must become core competencies. Benchmark results are increasingly used to support claims about mental health safety that may not be accurate, making it necessary to close the gap between clinical measurement and AI evaluation.
Plain Language Summary: AI chatbots like ChatGPT, Claude, and Gemini are increasingly used by millions of people to discuss mental health problems, including thoughts of suicide. To assess whether these systems handle such conversations safely, researchers give them standardized tests called benchmarks and compare their answers to those of human experts. These scores are already used to argue AI systems are ready for clinical use. This study gave a well-established test of suicide response skills to nine AI models from three major companies under varying conditions. We changed how much instruction the AI received and how much randomness was built into its responses, then measured whether the scores changed. The same AI model could score like a trained crisis counselor under one set of conditions and like an untrained undergraduate under another, depending on choices the person running the test made. Every model also made the same kind of mistake: responses that sounded warm and caring were rated as appropriate, even when experts had judged them to be clinically problematic. The highest-scoring model performed so well that the test could no longer measure whether it was truly skilled or had simply exceeded the test's range. These findings show that a single score can be misleading without knowing how the test was run, whether it can still distinguish strong from weak performance, and whether it matches what the AI is used for. Mental health professionals routinely make these judgments about clinical assessments and are well positioned to bring that expertise to AI evaluation.
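The configuration sweep in the Method (three prompt variants × two temperature settings × 10 repetitions, i.e. 60 runs per item per model) is simply a Cartesian product. A sketch with placeholder labels follows; the actual prompt texts and temperature values are not given in the abstract and are invented here.

```python
from itertools import product

prompt_variants = ["variant_a", "variant_b", "variant_c"]  # placeholder names
temperatures = [0.0, 1.0]                                  # assumed values
repetitions = range(10)

# Every (prompt, temperature, repetition) combination for a single item
runs_per_item = list(product(prompt_variants, temperatures, repetitions))
n_runs_per_item = len(runs_per_item)  # 3 * 2 * 10 = 60 presentations per model
```

Sweeping the grid rather than scoring one fixed configuration is what exposes how much a benchmark score depends on the tester's choices.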

14
Explaining temporally clustered errors with an autocorrelated Drift Diffusion Model

Vloeberghs, R.; Tuerlinckx, F.; Urai, A. E.; Desender, K.

2026-03-23 neuroscience 10.64898/2026.03.20.713186 medRxiv
Top 0.1%
1.7%

A widely used framework for studying the computational mechanisms of decision making is the Drift Diffusion Model (DDM). To account for the presence of both fast and slow errors in empirical data, the DDM incorporates across-trial variability in parameters such as the drift rate and the starting point. Although these variability parameters enable the model to reproduce both fast and slow errors, they rely on the assumption that over trials each parameter is independently sampled. As a result, the DDM effectively predicts that errors--whether fast or slow--occur randomly over time. However, in empirical data this assumption is violated, as error responses are often temporally clustered. To address this limitation, we introduce the autocorrelated DDM, in which trial-to-trial fluctuations in drift rate, starting point, and boundary evolve according to first-order autoregressive (AR1) processes. Using simulations, we demonstrate that, unlike the across-trial variability DDM, the autocorrelated DDM naturally accounts for temporal clustering of errors. We further show that model parameters can be reliably recovered using Amortized Bayesian Inference, even with as few as 500 trials. Finally, fits to empirical data indicate that the autocorrelated DDM provides the best account of error clustering, highlighting that computational parameters fluctuate over time, despite typically being estimated as fixed across trials.
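The AR1 structure described above can be sketched directly: each trial's drift rate is pulled toward its mean with persistence phi, so low-drift excursions (and hence errors) cluster in time, unlike independent across-trial sampling. A minimal illustration with arbitrary parameter values, not the authors' fitted estimates:

```python
import random

def ar1_series(n, phi=0.8, sd=0.3, mean=1.0, rng=None):
    """First-order autoregressive fluctuations around a mean drift rate.
    The innovation sd is scaled so the stationary sd of the series equals `sd`."""
    rng = rng or random.Random(1)
    innov_sd = sd * (1.0 - phi ** 2) ** 0.5
    series = [mean]
    for _ in range(n - 1):
        series.append(mean + phi * (series[-1] - mean) + rng.gauss(0.0, innov_sd))
    return series

drifts = ar1_series(500, phi=0.8)

# Empirical lag-1 autocorrelation should recover something near phi
m = sum(drifts) / len(drifts)
num = sum((a - m) * (b - m) for a, b in zip(drifts, drifts[1:]))
den = sum((a - m) ** 2 for a in drifts)
rho = num / den
```

Setting phi = 0 recovers the standard across-trial variability DDM's independent sampling, which is what makes the two accounts directly comparable.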

15
An independent supervisory safety agent improves reaction of large language models to suicidal ideation

Trivedi, S.; Simons, N. W.; Tyagi, A.; Ramaswamy, A.; Nadkarni, G. N.; Charney, A. W.

2026-04-15 psychiatry and clinical psychology 10.64898/2026.04.13.26350757 medRxiv
Top 0.1%
1.7%

Background: Large language models (LLMs) are increasingly used in mental health contexts, yet their detection of suicidal ideation is inconsistent, raising patient safety concerns. Objective: To evaluate whether an independent safety monitoring system improves detection of suicide risk compared with native LLM safeguards. Methods: We conducted a cross-sectional evaluation using 224 paired suicide-related clinical vignettes presented in a single-turn format under two conditions (with and without structured clinical information). Native LLM safeguard responses were compared with an independent supervisory safety architecture with asynchronous monitoring. The primary outcome was detection of suicide risk requiring intervention. Results: The supervisory system detected suicide risk in 205 of 224 evaluations (91.5%) versus 41 of 224 (18.3%) for native LLM safeguards. Among 168 discordant evaluations, 166 favored the supervisory system and 2 favored the LLM (matched odds ratio ≈83.0). Both systems detected risk in 39 evaluations, and neither in 17. Detection was highest in scenarios with explicit suicidal ideation and lower in more ambiguous presentations. Conclusions: Native LLM safeguards frequently failed to detect suicide risk in this structured evaluation. An independent monitoring approach substantially improved detection, supporting the role of external safety systems in high-risk mental health applications of LLMs.
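The matched odds ratio quoted above follows directly from the discordant pairs, as in a McNemar-style paired analysis: concordant pairs are ignored, and the ratio of pairs favoring one system to pairs favoring the other gives the conditional odds ratio. Using the counts reported in the abstract:

```python
# Discordant paired evaluations reported in the abstract
favor_supervisory = 166  # supervisory system detected risk, native safeguards did not
favor_native = 2         # native safeguards detected risk, supervisory system did not

# Conditional (matched-pairs) odds ratio; concordant pairs (39 both, 17 neither)
# do not enter this estimate
matched_or = favor_supervisory / favor_native
print(matched_or)  # 83.0
```

The same 2×2 discordant-cell counts would feed a McNemar test of whether the two detection rates differ.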

16
Longterm Temporal Dynamics of Suicidal Ideation: A Dynamic Time Warping Analysis of Depression, Anxiety, Worry, and Mastery

Gijzen, M. W.; van der Slot, A. J.; Eikelenboom, M.; de Beurs, D.; Penninx, B. W.; Giltay, E. J.

2026-02-28 psychiatry and clinical psychology 10.64898/2026.02.20.26345909 medRxiv
Top 0.1%
1.7%

Background: Suicidal ideation (SI) fluctuates over time, yet traditional static risk factors align poorly with these dynamics. Understanding dynamic symptom patterns may advance knowledge of the temporal interplay between SI and co-occurring symptoms in adults with depressive and anxiety disorders. Materials and methods: We analyzed six waves (at baseline, and after 2, 4, 6, 9, and 13 years of follow-up) of the Netherlands Study of Depression and Anxiety (NESDA; n = 305, mean age 40.8 years, 62% female) in participants with any SI fluctuation over time. Variables included depressive, anxiety, mastery, and worry symptoms. Dynamic Time Warping (DTW) quantified within-person temporal alignment between SI and other symptoms, and an undirected network and forest plot visualized co-fluctuations. Analyses were stratified by age group and sex. Results: Over the years, SI co-fluctuated most strongly with affective and anhedonic depressive symptoms, including sad mood, low capacity for pleasure, low general interest, pessimism, quality of mood, and decreased appetite. Select anxiety (terrified/afraid) and worry (overwhelming worries) items also aligned with SI, whereas mastery items did not. Patterns were broadly consistent across age and sex subgroups. Networks indicated that SI is part of a cluster of depressogenic symptoms but bridges to acute fear and persistent worry. Conclusions: SI is a dynamic phenomenon closely linked to specific depressive, anxiety, and worry symptoms. Interventions targeting mood instability, anhedonia, and uncontrollable worry, combined with real-time monitoring, may improve personalized suicide prevention. DTW provides a framework to identify long-term temporally proximal symptom patterns.
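Dynamic Time Warping aligns two time series by stretching or compressing the time axis so that similar shapes match even when they are shifted in time, which is why it suits symptom trajectories sampled at irregular long intervals. A minimal dynamic-programming sketch using absolute difference as the local cost (real analyses typically standardize the series and constrain the warping window):

```python
def dtw_distance(a, b):
    """Classic O(n*m) dynamic-programming DTW distance between two sequences."""
    inf = float("inf")
    n, m = len(a), len(b)
    D = [[inf] * (m + 1) for _ in range(n + 1)]
    D[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(a[i - 1] - b[j - 1])
            # extend the cheapest of: insertion, deletion, match
            D[i][j] = cost + min(D[i - 1][j], D[i][j - 1], D[i - 1][j - 1])
    return D[n][m]

# A time-shifted copy of the same shape aligns perfectly; an inverted shape does not
s1 = [0, 1, 2, 3, 2, 1, 0]
s2 = [0, 0, 1, 2, 3, 2, 1, 0]   # same shape, delayed by one step
s3 = [3, 2, 1, 0, 1, 2, 3]      # inverted shape
shifted_cost = dtw_distance(s1, s2)
inverted_cost = dtw_distance(s1, s3)
```

Low DTW cost between a symptom series and the SI series is what the study reads as within-person co-fluctuation; a directed variant (restricting the warping to one temporal direction) yields the precedence analyses.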

17
Predicting Impulsive Choices: Development of a Novel Experimental Task

Ma, H.; Fennema, D.; Simblett, S.; Zahn, R.

2026-03-12 psychiatry and clinical psychology 10.64898/2026.03.11.26348147 medRxiv
Top 0.1%
1.7%

Aims: Due to the multifaceted nature of "impulsivity", its measurement remains fragmented. Here, we developed the Risky Social Choices task to provide evidence for its validity and reliability, while testing the hypothesis that impaired access to implicit knowledge of negative long-term consequences is of distinct importance for "impulsive" decision-making in a general population sample. Methods: Forty participants chose whether to engage in risk-taking behaviors. The task combined web-based AI-generated videos with narrated hypothetical scenarios and measured worries related to negative long-term consequences, approach-related motivation for short-term rewards, and the response time to and accuracy of recognizing degraded auditory prime words denoting negative long-term consequences. Results: A pre-registered multi-step regression model was constructed with worry, motivation, response time, and accuracy as predictors and percentage of risky choices as the outcome. Among these predictors, only prime word recognition accuracy was significantly negatively associated with risky choices, confirming our hypothesis of the role of reduced implicit access to negative long-term consequences in risk-taking decisions. Approach-related motivation for rewards was the only predictor significantly positively related to the percentage of risky choices. Discussion: As predicted, the negative association between risky choices and implicit access to negative long-term consequences supports its role as a distinct aspect of "impulsivity". The novel task successfully captured this aspect, paving the way for a more precise neurocognitive characterization of clinical conditions in which "impulsivity" plays a key role. The findings unveil the importance of implicit social sequential knowledge for impulsivity in neurotypical populations, so far investigated only in patients with brain lesions.

18
Policy precision reveals action-phase impulsivity in women with premenstrual syndrome during risk-taking

Jeong, B.; Yoon, D.

2026-03-16 neuroscience 10.64898/2026.03.12.711243 medRxiv
Top 0.1%
1.5%

The Balloon Analogue Risk Task (BART) is widely used to assess risk-taking and impulsivity, yet existing computational models struggle to unify sequential and prior evaluation strategies or fully capture uncertainty-driven information-seeking behavior. To address this, we introduce a novel computational framework grounded in the Active Inference Framework (AIF), which conceptualizes behavior as the minimization of expected free energy. Model comparisons demonstrate that AIF-based models statistically outperform existing benchmarks. Furthermore, we applied this framework to investigate impulsivity in women with Premenstrual Syndrome (PMS). Our model revealed that the PMS group exhibited significantly higher values of the inverse policy precision (β0), and a phase difference in this parameter across the menstrual cycle was observed only in the PMS group. This suggests that high β0 serves as a robust computational marker, reflecting both the trait impulsivity inherent in PMS and its state-like exacerbation across the menstrual cycle. Lastly, our findings indicate that impulsivity in PMS manifests not as a learning deficit, but as heightened sensitivity to trial-by-trial sequential evaluation at the expense of stable, pre-planned prior policies. This framework provides a neurobiologically plausible and mechanistically granular understanding of risk-taking, offering new avenues for computational psychiatry.
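In many active inference formulations, policies are selected via a softmax over negative expected free energy scaled by a precision parameter; raising the inverse precision (as with the β0 discussed above, under the common convention that β is the reciprocal of the softmax gain γ) flattens the policy distribution and makes choices more erratic. The sketch below illustrates that mechanism only; it is not the authors' model, and all values are invented.

```python
import math

def policy_probs(neg_efe, precision):
    """Softmax over negative expected free energy, scaled by policy precision.
    Lower precision (i.e., higher inverse precision) flattens the distribution."""
    logits = [precision * g for g in neg_efe]
    mx = max(logits)                      # subtract max for numerical stability
    exps = [math.exp(l - mx) for l in logits]
    z = sum(exps)
    return [e / z for e in exps]

# Hypothetical expected free energies for three policies (lower G = better policy)
G = [3.0, 2.0, 1.0]
neg_G = [-g for g in G]

sharp = policy_probs(neg_G, precision=4.0)   # high precision: decisive selection
flat = policy_probs(neg_G, precision=0.5)    # low precision: near-uniform selection
```

With high precision the best policy (lowest G) dominates; with low precision all three policies retain substantial probability, which is the computational signature the study links to action-phase impulsivity.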

19
Within-person temporal alignment shows symptom co-fluctuations and early precursors of suicidal ideation

Van Der Slot, A. J.; Boonmann, C.; Eikelenboom, M.; Gijzen, M.; Kok, A. A. L.; de Beurs, D.; Penninx, B. W.; Giltay, E. J.

2026-01-29 psychiatry and clinical psychology 10.64898/2026.01.27.26344922 medRxiv
Top 0.1%
1.5%

Background: Suicidal ideation (SI) is a major global concern, yet its dynamic interplay with other symptoms remains poorly understood. Objective: To identify symptoms that co-fluctuate with or temporally precede SI to improve warning-signal detection and intervention. Methods: Longitudinal data from three Dutch psychiatric cohorts with lifetime internalizing disorders (16 waves from April 2020 until February 2022) were collected during the COVID-19 pandemic. We analyzed depressive, happiness, anxiety, loneliness, and worry symptoms, and COVID-19-specific items, only in those participants with SI fluctuations. Dynamic Time Warping (DTW) quantified within-person similarity between symptom trajectories and SI, and results were aggregated at the group level. Findings: The 307 participants (mean age 44.8 years; 61.6% female) showed increasing SI over time (p < .001). SI aligned with four depressive symptoms (sad mood, low self-esteem, low interest, and reduced happiness), two anxiety-related symptoms (fear of losing control, faintness), feeling abandoned, and overwhelming worrying. In directed DTW analysis, sad mood, hypersomnia, worrying about projects, and numbness/tingling showed significant temporal precedence before SI. Conclusion: SI is embedded in a broad symptom network beyond depression. These results underscore the value of time-sensitive, idiographic monitoring using tools like DTW to capture the person-specific temporal pathways through which SI emerges and intensifies. Clinical implications: This study suggests a core group of affective, cognitive, and interpersonal symptoms that could serve as informative signals for evaluating changes in SI and may represent actionable targets for intervention.
Summary Box
What is already known on this topic?
- Suicidal ideation (SI) is a dynamic phenomenon, yet traditional research often relies on static, group-level averages that do not capture individual fluctuations.
- While SI is linked to depression, it can emerge independently through complex interactions with other affective and interpersonal states.
What this study adds?
- This study identifies a set of affective, cognitive, and interpersonal symptoms (sad mood, overwhelming worry, and feelings of abandonment) that significantly co-fluctuate with SI over weeks and months.
- Additionally, four specific "leading" symptoms (sad mood, hypersomnia, worrying about projects, and somatic numbness) were found to precede increases in SI.
How this study might affect research, practice or policy?
- The identified co-fluctuations and precursors serve as informative "(early) warning signals" that can improve individual risk stratification and clinical monitoring and may represent targets for intervention.
- The results support a shift toward network-based models in suicidology, emphasizing the need for time-sensitive monitoring to capture the complex and dynamic nature of suicidality.

20
Diagnostic Accuracy and Clinical Reasoning of Multiple Large Language Models in Psychiatry

Jin, K. W.; Rostam-Abadi, Y.; Chaudhary, P.; Garrett, M. A.; Huang, A. S.; Montelongo, M.; Nagpal, C.; Shei, J.; Weathers, J.; Zhang, J. S.; Chen, Q.; Kim, J.; Malgaroli, M.; Mathis, W. S.; Rodriguez, C. I.; Selek, S.; Sharma, M. S.; Pittenger, C.; Yip, S. W.; Zaboski, B. A.; Xu, H.

2026-02-09 psychiatry and clinical psychology 10.64898/2026.02.03.26345402 medRxiv
Top 0.1%
1.3%

Importance: Large language models (LLMs) have demonstrated diagnostic potential in several medical specialties, but their application to psychiatry - where diagnosis relies heavily on clinical judgment, narrative interpretation, and reasoning under uncertainty - remains insufficiently evaluated. Objective: To evaluate the diagnostic accuracy and clinician-judged reasoning quality of multiple large language models using psychiatric case vignettes. Design: Mixed-methods evaluation study of diagnostic accuracy across four LLMs using 196 psychiatric case vignettes (135 published and 61 novel). Clinical reasoning quality was evaluated on a randomly selected subset of 30 vignettes using structured clinician ratings along two reasoning dimensions. The highest-performing model was illustratively compared with psychiatry trainees on the same subset. Diagnostic correctness for the full vignette set was assessed by a separate adjudicator LLM. Setting: Publicly available model interfaces, December 2025. Participants: Five board-certified psychiatrists evaluated model-generated clinical reasoning. Two psychiatry residents served as the illustrative human comparison. Main Outcomes and Measures: Diagnostic accuracy and clinician-rated clinical reasoning quality. Diagnostic accuracy was assessed using top-1 accuracy, top-5 accuracy, recall@5, and mean reciprocal rank based on ranked lists of five differential diagnoses per vignette. Clinical reasoning quality was assessed using two 5-point Likert scales adapted from the Accreditation Council for Graduate Medical Education Psychiatry Residency Milestones, evaluating data extraction and diagnostic reasoning. Results: Across 196 psychiatric case vignettes, Claude Opus 4.5 (Anthropic) achieved the highest diagnostic accuracy (top-1 accuracy, 0.638; top-5 accuracy, 0.801; recall@5, 0.731; mean reciprocal rank, 0.710) and the highest clinician-rated reasoning scores.
Higher clinician-rated diagnostic reasoning quality was strongly associated with diagnostic correctness in mixed-effects logistic regression analyses (β = 1.80; p < 0.001), corresponding to an approximately six-fold increase in the odds of a correct diagnosis per 1-point increase in reasoning score. In an illustrative comparison, the diagnostic accuracy of Claude Opus 4.5 fell within the range observed for psychiatry trainees. Conclusions and Relevance: LLMs demonstrated high diagnostic accuracy and generated clinical reasoning that clinicians judged to be largely coherent and safe. Diagnostic reasoning quality was more strongly associated with diagnostic correctness than data extraction quality, underscoring the importance of evaluating reasoning alongside accuracy when assessing LLMs for clinical decision support in psychiatry.
Key Points
Question: Can multiple large language models accurately diagnose psychiatric conditions and generate diagnostic reasoning that clinicians judge as coherent, safe, and clinically meaningful?
Findings: Across 196 psychiatric case vignettes, four large language models demonstrated high diagnostic accuracy. In a clinician-evaluated subset of 30 vignettes, model diagnostic accuracy fell within the range observed for psychiatry residents. Clinicians judged model-generated diagnostic reasoning to be largely coherent and safe. Higher clinician-rated reasoning quality was strongly associated with diagnostic correctness, independent of data extraction quality.
Meaning: Evaluating diagnostic reasoning, in addition to accuracy, may be important when assessing large language models for potential clinical decision support in psychiatry.
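The ranking metrics used in the study (top-1 accuracy, top-5 accuracy, mean reciprocal rank) are simple to compute from each vignette's ranked differential list. The sketch below assumes a single ground-truth diagnosis per vignette; recall@5 can differ from top-5 accuracy when multiple diagnoses are acceptable. The diagnosis labels and lists are invented toy data, not the study's vignettes.

```python
def ranking_metrics(ranked_lists, truths, k=5):
    """Top-1 accuracy, top-k accuracy, and mean reciprocal rank
    (reciprocal-rank credit only within the top k)."""
    top1 = topk = rr_sum = 0.0
    for ranks, truth in zip(ranked_lists, truths):
        if ranks and ranks[0] == truth:
            top1 += 1
        if truth in ranks[:k]:
            topk += 1
            rr_sum += 1.0 / (ranks[:k].index(truth) + 1)
    n = len(truths)
    return top1 / n, topk / n, rr_sum / n

# Toy differentials for three vignettes
preds = [["MDD", "GAD"], ["OCD", "MDD"], ["PTSD", "GAD"]]
truth = ["MDD", "MDD", "BipolarI"]
top1, top5, mrr = ranking_metrics(preds, truth)
# top1 = 1/3 (first vignette only), top5 = 2/3, mrr = (1 + 1/2 + 0) / 3 = 0.5
```

Mean reciprocal rank rewards placing the correct diagnosis high in the differential even when it is not ranked first, which is why it complements top-1 accuracy for five-item differential lists.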